Methods for the Classification of Data from Open-Ended Questions in Surveys
Disputation 16 April 2024
Camille Landesvatter
University of Mannheim
Research Questions and Motivation
Which methods can we use to classify data from open-ended survey questions?
Can we leverage these methods to make empirical contributions to substantive questions?
Motivation:
➡️ The increase in methods to collect natural language (e.g., smartphone surveys and voice technologies) calls for testing and validating automated methods to analyze the resulting data.
➡️ Open-ended survey answers pose a unique challenge for ML applications due to their shortness and lack of context. Effective analysis may require suitable methods, e.g., word embeddings or structural topic models.
Characteristics of Open-Ended Survey Answers
Figure 1: The previous question was: ‘How often can you trust the federal government in Washington to do what is right?’. Your answer was: ‘[Always; Most of the time; About half of the time; Some of the time; Never; Don’t Know]’. In your own words, please explain why you selected this answer.
Structure and Approach of the Dissertation
“[…] introducing various methods of classifying data from open-ended survey questions and empirically illustrating their application. A central research question addressed in this thesis therefore concerns the analysis of (short) text data generated by open-ended survey questions.”
introducing readers to the survey methodology of using open-ended questions
including historical and modern developments, characteristics and challenges of open-ended questions, types of OEQs (e.g., probing)
introducing readers to computational methods available for analysis of open-ended answers
manual, semi-automated, fully automated
applying several of these methods in three empirical studies
Methods for Analyzing Data from Open-Ended Questions
Table 1. Overview of methods for classifying open-ended survey responses
Motivation: Why Computational Methods?
fully manual methods require substantial resources (time and effort)
but more importantly, human coding can
be biased (Mosca et al., 2022),
lack objectivity (Inui et al., 2001),
introduce errors when coders misinterpret answers or annotation codes (Giorgetti & Sebastiani, 2003),
face transparency issues related to unitization and intercoder reliability (Campbell et al., 2013).
automated methods offer objectivity and systematic procedures (Zhang et al., 2022)
still, issues persist (e.g., transparency), which makes it crucial to test and evaluate these methods for the social sciences
Studies
Overview
Study 1
Study 2
Study 3
How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning
Open-ended survey questions: A comparison of information content in text and audio response formats
Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?
data collection approach: three self-administered web surveys with open-ended questions
data from three U.S. non-probability samples
methodology for text classification: supervised ML, unsupervised ML, fine-tuning of pre-trained language model BERT, zero-shot learning
How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning
Co-authored by: Dr. Paul C. Bauer
Published In: Landesvatter, C., & Bauer, P. C. (2024). How Valid Are Trust Survey Measures? New Insights From Open-Ended Probing Data and Supervised Machine Learning. Sociological Methods & Research, 0(0). https://doi.org/10.1177/00491241241234871
The validity of trust survey measures: Background
Background:
ongoing debate about which type of trust researchers are measuring with traditional survey items (i.e., the equivalence debate; cf. Bauer & Freitag, 2018)
Research Question:
How valid are traditional trust survey measures?
Experimental Design:
block randomized question order where closed-ended questions are followed by open-ended follow-up probing questions
The validity of trust survey measures: Methodology
Operationalization via two classifications: (I) share of known vs. unknown others in associations; (II) sentiment (positive/neutral/negative) of associations
Supervised classification approach:
manual labeling of randomly sampled documents (n = 1,000 and 1,500 for the two classifications)
fine-tuning the weights of two BERT models (base model, uncased version), using the manually coded data as training data, to classify the remaining 6,500 and 6,000 documents
accuracy: 87% (I) and 95% (II)
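The supervised workflow above (label a subset by hand, train on it, then classify the rest) can be sketched with a deliberately simple baseline. The study fine-tuned BERT, but a TF-IDF + Random Forest pipeline, the kind of simple starting point used early in Study 1 (see the conclusion), illustrates the same label-train-predict structure. All texts and labels below are invented toy examples, not study data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-ins for manually labeled probing answers (invented examples).
labeled_texts = [
    "I thought of my family and close friends",
    "people I know personally, like neighbors",
    "strangers on the street, people in general",
    "unknown people you meet in daily life",
] * 5  # repeat to give the model a few examples per class
labels = ["known", "known", "unknown", "unknown"] * 5

# Simple baseline: TF-IDF features + Random Forest classifier.
clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
clf.fit(labeled_texts, labels)

# Classify the remaining (unlabeled) documents.
unlabeled = ["my friends and relatives", "random strangers in public"]
predictions = clf.predict(unlabeled)
print(predictions)
```

The same split logic scales up: in Study 1, the manually coded subset plays the role of `labeled_texts`, and the fine-tuned BERT model replaces the Random Forest.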
The validity of trust survey measures: Results
Figure 1: Illustration of exemplary data.
Figure 2: Associations and trust scores across different measures.
Open-ended survey questions: A comparison of information content in text and audio response formats
Co-authored by: Dr. Paul C. Bauer
Submitted to: Public Opinion Quarterly in February 2024
Information Content in Text vs. Audio Responses: Background
Background:
recent increase in voice-based response options in surveys, driven by mobile devices with voice input, smartphone surveys, and speech-to-text technologies
Research Question:
Are there differences in information content between responses given in voice and text formats?
Experimental Design:
block randomized question order with open-ended and probing questions
random assignment into either the text or voice condition
Information Content in Text vs. Audio Responses: Methodology
Operationalization via application of measures from information theory and machine learning to classify open-ended survey answers
number of topics, response entropy
plus, response length
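Of these measures, response entropy is easy to sketch. The exact operationalization used in the study is not shown here; a word-level Shannon entropy over the response's token distribution is one common choice:

```python
import math
from collections import Counter

def response_entropy(text: str) -> float:
    """Shannon entropy (in bits) of a response's word distribution.

    A rough proxy for information content: responses that reuse the same
    few words score lower than responses with a varied vocabulary.
    """
    words = text.lower().split()
    total = len(words)
    if total == 0:
        return 0.0
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four distinct words, each used once: entropy = log2(4) = 2 bits.
print(response_entropy("government does things right"))  # 2.0
# A response repeating one word carries no variety: entropy = 0 bits.
print(response_entropy("yes yes yes"))  # 0.0
```

Longer, more varied responses (as audio answers often are) would score higher on such a measure, which is why entropy complements raw response length.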
Information Content in Text vs. Audio Responses: Results
Figure 3: Information Content Measures across questions.
Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?
Co-authored by: Dr. Paul C. Bauer
Submitted to: American Political Science Review in March 2024
Affective Components in Political Trust: Background
Background:
the conventional notion that trust originates from informed, rational, and consequential judgments is challenged by the idea of an “affective-based” form of (political) trust
Research Question:
Are individual trust judgments in surveys driven by affective rationales?
Questionnaire Design:
closed-ended political trust question followed by open-ended probing question
Affective Components in Political Trust: Methodology
Operationalization via sentiment and emotion analysis
Transcript-based
pysentimiento for sentiment recognition (Pérez et al. 2023)
zero-shot prompting with GPT-3.5
Speech-based
SpeechBrain for Speech Emotion Recognition (Ravanelli et al. 2021)
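As a hypothetical illustration of the zero-shot approach (the actual prompt used with GPT-3.5 in the study is not reproduced here), a zero-shot prompt supplies only a task instruction and the label set, with no labeled examples:

```python
# Hypothetical zero-shot sentiment prompt; the study's actual wording may differ.
LABELS = ["positive", "neutral", "negative"]

def build_zero_shot_prompt(answer: str) -> str:
    """Build a zero-shot classification prompt: task instruction plus the
    answer to classify, with no labeled examples (hence 'zero-shot')."""
    return (
        "Classify the sentiment of the following open-ended survey answer "
        f"as one of: {', '.join(LABELS)}.\n\n"
        f'Answer: "{answer}"\n'
        "Sentiment:"
    )

prompt = build_zero_shot_prompt("Politicians only care about themselves.")
print(prompt)
# This string would then be sent to a chat-completion endpoint (e.g., GPT-3.5)
# and the returned label parsed from the model's reply.
```

The appeal of this setup is that no manual labeling is needed at all; its cost is less control over (and transparency of) the resulting classifications.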
Affective Components in Political Trust: Results
Figure 6: Results from Speech Emotion Recognition.
Summary
web surveys can be used to collect narrative answers that provide valuable insights into survey responses
various modern developments (smartphone surveys, speech-to-text algorithms) can be leveraged to collect such data in innovative ways (e.g., spoken answers)
always consider challenges and objectives (i.e., in terms of sample sizes and sample compositions)
computational measures can be applied to classify open-ended answers from surveys in order to inform ongoing debates in different fields, e.g.:
equivalence debate in trust research (Study 1); cognitive-versus-affective debate in political trust research (Study 3)
survey questionnaire design (Study 2) or item and data quality in general (e.g., associations, sentiment, emotions) (Study 1-3)
Conclusion
Machine Learning and Open-Ended Answers
Semi-automated methods have become more accessible and easier to implement.
supervised models have been a standard in automated methods, but the recent development of large, general-purpose pre-trained models (e.g., BERT) allows less resource-intensive fine-tuning
For example, using only ~13% of documents (1,000 of 7,500 in Study 1) for fine-tuning yielded sufficient accuracy (87%)
increasing the number of manually labeled documents can help in terms of accuracy (i.e., 92% in Study 1)
a higher number of manual examples also improves the transparency of results: an accuracy vs. transparency trade-off
start with simple methods and evaluate (e.g., in Study 1, first a Random Forest, only later BERT)
Conclusion
Machine Learning and Open-Ended Answers
Increased possibilities of fully automated methods (e.g., prompt engineering).
fully automated methods such as zero-shot prompting can keep up with fine-tuned pre-trained models (e.g., pysentimiento in Study 3)
deciding on a suitable number of manual examples, and on a method in general (e.g., fully automated/unsupervised vs. semi-automated/supervised with fine-tuning), depends on expected task difficulty, desired accuracy, and available time and budget